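The layout of a mobile screen is a critical data source for UI design research and for semantic understanding of the screen. However, UI layouts in existing datasets are often noisy, mismatched with their visual representation, or composed of generic or app-specific types that are difficult to analyze and model. In this paper, we propose the CLAY pipeline, which uses a deep learning approach to denoise UI layouts, allowing us to automatically improve existing mobile UI layout datasets at scale. Our pipeline takes both the screenshot and the raw UI layout, and annotates the raw layout by removing incorrect nodes and assigning a semantically meaningful type to each node. To experiment with our data-cleaning pipeline, we create the CLAY dataset of 59,555 human-annotated screen layouts, based on screenshots and raw layouts from Rico, a public mobile UI corpus. Our deep models achieve high accuracy, with F1 scores of 82.7% for detecting layout objects that have no valid visual representation and 85.9% for recognizing object types, which significantly outperforms a heuristic baseline. Our work lays a foundation for creating large-scale, high-quality UI layout datasets for data-driven mobile UI research and reduces the need for manual labeling, which is prohibitively expensive.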
translated by Google Translate
View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/fungraph/neural_catacaustics/
Analogical proportions compare pairs of items (a, b) and (c, d) in terms of their differences and similarities. They play a key role in the formalization of analogical inference. The paper first discusses how to improve analogical inference in terms of accuracy and in terms of computational cost. Then it indicates the potential of analogical proportions for explanation. Finally, it highlights the close relationship between analogical proportions and multi-valued dependencies, which reveals an unsuspected aspect of the former.
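As a small illustration of how differences between pairs are compared, the sketch below uses the standard Boolean definition of an analogical proportion: a:b::c:d holds when a differs from b exactly as c differs from d. The abstract does not spell out a definition, so this choice and the helper names `is_analogical_proportion` and `solve` are assumptions made for illustration only.

```python
from itertools import product


def is_analogical_proportion(a, b, c, d):
    """Boolean a:b::c:d (assumed standard definition).

    Holds when 'a is true and b is false' matches 'c is true and d is false',
    and 'a is false and b is true' matches 'c is false and d is true'.
    """
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def solve(a, b, c):
    """Analogical inference: every d that completes a:b::c:d."""
    return [d for d in (False, True) if is_analogical_proportion(a, b, c, d)]


# Enumerate every 4-tuple of truth values that forms a valid proportion.
valid = [
    bits for bits in product([False, True], repeat=4)
    if is_analogical_proportion(*bits)
]

if __name__ == "__main__":
    for bits in valid:
        print("".join("1" if x else "0" for x in bits))
    print(solve(False, True, False))  # the change from a to b transfers to c
    print(solve(False, True, True))   # no d can reproduce the change
```

Under this definition the equation a:b::c:x does not always have a solution, which is one reason analogical inference only applies to a subset of candidate triples.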
Recent advances in self-supervised visual representation learning have paved the way for unsupervised methods tackling tasks such as object discovery and instance segmentation. However, discovering objects in an image with no supervision is a very hard task; what are the desired objects, when to separate them into parts, how many are there, and of what classes? The answers to these questions depend on the tasks and datasets of evaluation. In this work, we take a different approach and propose to look for the background instead. This way, the salient objects emerge as a by-product without any strong assumption on what an object should be. We propose FOUND, a simple model made of a single $1\times1$ convolution initialized with coarse background masks extracted from self-supervised patch-based representations. After fast training and refining these seed masks, the model reaches state-of-the-art results on unsupervised saliency detection and object discovery benchmarks. Moreover, we show that our approach yields good results in the unsupervised semantic segmentation retrieval task. The code to reproduce our results is available at https://github.com/valeoai/FOUND.
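To make the "single 1×1 convolution" concrete: over a grid of patch features, a 1×1 convolution is the same linear map applied independently to each patch. The sketch below is not the FOUND implementation. The function names, the sigmoid output, the 0.5 threshold, and the toy weights are all assumptions chosen to show how a background score per patch yields the object mask as its complement.

```python
import math


def conv1x1_background(features, weights, bias):
    """Apply one shared linear map plus sigmoid to every patch feature vector.

    features: H x W grid (nested lists) of C-dimensional patch features.
    Returns an H x W grid of background probabilities (hypothetical head).
    """
    return [
        [
            1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(weights, patch)) + bias)))
            for patch in row
        ]
        for row in features
    ]


def foreground_mask(background, threshold=0.5):
    """Objects emerge as the complement of the predicted background."""
    return [[p < threshold for p in row] for row in background]


# Toy 2 x 2 grid of 2-dimensional features; the weights are illustrative only.
FEATURES = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
background = conv1x1_background(FEATURES, weights=[1.0, -1.0], bias=0.0)
mask = foreground_mask(background)

if __name__ == "__main__":
    print(mask)
```

Because the layer has only C weights and one bias, training it against seed masks is cheap, which is consistent with the "fast training" the abstract mentions.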
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled, and to use the underlying latent vectors as input to the perception head. The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information, that can be used to boost an actual perception task. This principle has a very simple formulation, which makes it both easy to implement and widely applicable to a large range of 3D sensors and deep networks performing semantic segmentation or object detection. In fact, it supports a single-stream pipeline, as opposed to most contrastive learning approaches, allowing training on limited resources. We conducted extensive experiments on various autonomous driving datasets, involving very different kinds of lidars, for both semantic segmentation and object detection. The results show the effectiveness of our method to learn useful representations without any annotation, compared to existing approaches. Code is available at https://github.com/valeoai/ALSO.
Deep learning has emerged as an effective solution for solving the task of object detection in images but at the cost of requiring large labeled datasets. To mitigate this cost, semi-supervised object detection methods, which consist in leveraging abundant unlabeled data, have been proposed and have already shown impressive results. However, most of these methods require linking a pseudo-label to a ground-truth object by thresholding. In previous works, this threshold value is usually determined empirically, which is time consuming, and only done for a single data distribution. When the domain, and thus the data distribution, changes, a new and costly parameter search is necessary. In this work, we introduce our method Adaptive Self-Training for Object Detection (ASTOD), which is a simple yet effective teacher-student method. ASTOD determines without cost a threshold value based directly on the ground value of the score histogram. To improve the quality of the teacher predictions, we also propose a novel pseudo-labeling procedure. We use different views of the unlabeled images during the pseudo-labeling step to reduce the number of missed predictions and thus obtain better candidate labels. Our teacher and our student are trained separately, and our method can be used in an iterative fashion by replacing the teacher by the student. On the MS-COCO dataset, our method consistently performs favorably against state-of-the-art methods that do not require a threshold parameter, and shows competitive results with methods that require a parameter sweep search. Additional experiments with respect to a supervised baseline on the DIOR dataset containing satellite images lead to similar conclusions, and prove that it is possible to adapt the score threshold automatically in self-training, regardless of the data distribution.
A paper by Alsinglawi et al. was recently accepted and published in Scientific Reports. In this paper, the authors aim to predict length of stay (LOS), discretized into either long (> 7 days) or short stays (< 7 days), of lung cancer patients in an ICU department using various machine learning techniques. The authors claim to achieve perfect results, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 100%, using a Random Forest (RF) classifier with the ADASYN class-balancing oversampling technique, which if accurate could have significant implications for hospital management. However, we have identified several methodological flaws within the manuscript which cause the results to be overly optimistic and would have serious consequences if used in clinical practice. Moreover, the reporting of the methodology is unclear and many important details are missing from the manuscript, which makes reproduction extremely difficult. We highlight the effect these oversights have had on the result and provide a more believable result of 88.91% AUROC when these oversights are corrected.
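In this work, we explore the use of objects in Simultaneous Localization and Mapping in unseen worlds and propose an object-aided system (OA-SLAM). More precisely, we show that, compared to low-level points, the major benefit of objects lies in their higher-level semantics and discriminating power. Points, on the contrary, have better spatial localization accuracy than the generic coarse models used to represent objects (cuboid or ellipsoid). We show that combining points and objects is of great interest for addressing the problem of camera pose recovery. Our main contributions are: (1) we improve the relocalization ability of a SLAM system using high-level object landmarks; (2) we build an automatic system capable of identifying, tracking and reconstructing objects with 3D ellipsoids; (3) we show that object-based localization can be used to reinitialize or resume camera tracking. Our fully automatic system allows object mapping and enhanced pose tracking recovery, which we think can significantly benefit the AR community. Our experiments show that the camera can be relocalized from viewpoints where classical methods fail. We demonstrate that this localization allows a SLAM system to continue working despite a tracking loss, which can happen frequently with an uninitiated user. Our code and test data are released at gitlab.inria.fr/tangram/oa-slam.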
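Boosted trees are a dominant ML model, exhibiting high accuracy. However, boosted trees are hardly intelligible, and this is a problem whenever they are used in safety-critical applications. Indeed, in such a context, rigorous explanations of the predictions made are expected. Recent work has shown how subset-minimal abductive explanations can be derived for boosted trees using automated reasoning techniques. However, the generation of such well-founded explanations is intractable in the general case. To improve the scalability of their generation, we introduce the notion of tree-specific explanation for a boosted tree. We show that tree-specific explanations are abductive explanations that can be computed in polynomial time. We also explain how to derive a subset-minimal abductive explanation from a tree-specific explanation. Experiments on various datasets show the computational benefits of leveraging tree-specific explanations for deriving subset-minimal abductive explanations.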
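Machine learning models are prone to fail when test data differ from training data, a situation often encountered in real applications and known as distribution shift. While still valid, the training-time knowledge becomes less effective, and test-time adaptation is required to maintain high performance. Following approaches that assume batch-norm layers and use their statistics for adaptation, we propose Test-Time Adaptation with Principal Component Analysis (TTAwPCA), which presumes a fitted PCA and adapts, at test time, a spectral filter based on the singular values of the PCA for robustness to corruptions. TTAwPCA combines three components: the output of a given layer is decomposed using a Principal Component Analysis (PCA), filtered by a penalization of its singular values, and reconstructed with the PCA inverse transform. This generic enhancement adds fewer parameters than current methods. Experiments on CIFAR-10-C and CIFAR-100-C demonstrate the effectiveness and the limits of our method using a single filter of 2000 parameters.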
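The decompose, filter, reconstruct sequence described above can be sketched in a few lines. This is a minimal illustration and not the TTAwPCA implementation. It assumes an already fitted PCA given as a mean and orthonormal component vectors, and it stands in for the penalization of singular values with one multiplicative gain per component. The name `pca_filter` and the toy basis are hypothetical.

```python
import math


def pca_filter(x, mean, components, gains):
    """Decompose x on a fitted PCA basis, scale each coefficient, reconstruct.

    mean: feature-wise mean of the fitted PCA (assumed given).
    components: orthonormal principal directions, one list per component.
    gains: one filter weight per component (the adaptable parameters here).
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Decompose: coefficient of the centered input along each direction.
    coeffs = [sum(c * v for c, v in zip(comp, centered)) for comp in components]
    # Filter: attenuate or keep each spectral coefficient.
    filtered = [g * z for g, z in zip(gains, coeffs)]
    # Reconstruct: PCA inverse transform back to the original feature space.
    return [
        mi + sum(z * comp[j] for z, comp in zip(filtered, components))
        for j, mi in enumerate(mean)
    ]


# Toy fitted PCA in 2-D: axes rotated by 45 degrees (illustrative values only).
S = 1.0 / math.sqrt(2.0)
MEAN = [1.0, 1.0]
COMPONENTS = [[S, S], [S, -S]]

unchanged = pca_filter([3.0, 2.0], MEAN, COMPONENTS, gains=[1.0, 1.0])
filtered = pca_filter([3.0, 2.0], MEAN, COMPONENTS, gains=[1.0, 0.0])

if __name__ == "__main__":
    print(unchanged, filtered)
```

With all gains set to 1 the layer output passes through unchanged, so in this sketch the filter is a drop-in wrapper whose only adaptable parameters are the gains, one per retained component.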